Elo Rating

The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess. It is named after its creator Arpad Elo, a Hungarian-American physics professor. The Elo system was invented as an improved chess-rating system over the previously used Harkness system, but is also used as a rating system in association football, American football, baseball, basketball, pool, table tennis, and various board games and esports.
The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.
A player's Elo rating is represented by a number which may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, then only a few rating points will be taken from the lower-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher-rated player in the event of a draw. This means that the rating system is self-correcting. Players whose ratings are too low or too high should, in the long run, do better or worse correspondingly than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.
Elo ratings are comparative only, and are valid only within the rating pool in which they were calculated, rather than being an absolute measure of a player's strength.


History

Arpad Elo was a master-level chess player and an active participant in the United States Chess Federation (USCF) from its founding in 1939. The USCF used a numerical ratings system, devised by Kenneth Harkness, to allow members to track their individual progress in terms other than tournament wins and losses. The Harkness system was reasonably fair, but in some circumstances gave rise to ratings which many observers considered inaccurate. On behalf of the USCF, Elo devised a new system with a sounder statistical basis. At about the same time, György Karoly and Roger Cook independently developed a system based on the same principles for the New South Wales Chess Association.
Elo's system replaced earlier systems of competitive rewards with a system based on statistical estimation. Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements. For example, winning an important golf tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament. A statistical endeavor, by contrast, uses a model that relates the game results to underlying variables representing the ability of each player.
Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time. Elo thought of a player's true skill as the mean of that player's performance random variable.
A further assumption is necessary because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and derive a number to represent that player's skill. Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, they are assumed to have performed at a higher level than their opponent for that game. Conversely, if the player loses, they are assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.
Elo did not specify exactly how close two performances ought to be to result in a draw as opposed to a win or loss. In practice, the probability of a draw depends on the performance differential, so the boundary is better understood as a confidence interval than as a deterministic threshold. And while he thought it was likely that players might have different standard deviations to their performances, he made a simplifying assumption to the contrary.
To simplify computation even further, Elo proposed a straightforward method of estimating the variables in his model (i.e., the true skill of each player). One could calculate relatively easily from tables how many games players would be expected to win based on comparisons of their ratings to those of their opponents. The ratings of a player who won more games than expected would be adjusted upward, while those of a player who won fewer than expected would be adjusted downward. Moreover, that adjustment was to be in linear proportion to the number of wins by which the player had exceeded or fallen short of their expected number.
From a modern perspective, Elo's simplifying assumptions are not necessary because computing power is inexpensive and widely available. Several people, most notably Mark Glickman, have proposed using more sophisticated statistical machinery to estimate the same variables. On the other hand, the computational simplicity of the Elo system has proven to be one of its greatest assets. With the aid of a pocket calculator, an informed chess competitor can calculate to within one point what their next officially published rating will be, which helps promote a perception that the ratings are fair.


Implementing Elo's scheme

The USCF implemented Elo's suggestions in 1960, and the system quickly gained recognition as being both fairer and more accurate than the Harkness rating system. Elo's system was adopted by the World Chess Federation (FIDE) in 1970. Elo described his work in detail in ''The Rating of Chessplayers, Past and Present'', first published in 1978 (Elo 1986).
Subsequent statistical tests have suggested that chess performance is almost certainly not distributed as a normal distribution, as weaker players have greater winning chances than Elo's model predicts. In practice, there is little difference between the shape of the logistic and normal curves, so it does not matter much whether the logistic or normal distribution is used to calculate the expected scores. Mathematically, however, the logistic function is more convenient to work with. FIDE continues to use the rating difference table as proposed by Elo. The development of the Percentage Expectancy Table (table 2.11) is described in more detail by Elo as follows:
The normal probabilities may be taken directly from the standard tables of the areas under the normal curve when the difference in rating is expressed as a z score. Since the standard deviation σ of individual performances is defined as 200 points, the standard deviation σ' of the differences in performances becomes σ√2 or 282.84. The z value of a difference then is D/282.84. This will then divide the area under the curve into two parts, the larger giving P for the higher rated player and the smaller giving P for the lower rated player. For example, let D = 160. Then z = 160/282.84 = .566. The table gives .7143 and .2857 as the areas of the two portions under the curve. These probabilities are rounded to two figures in table 2.11.
The table is actually built with standard deviation 2000/7 as an approximation for 200√2. The normal and logistic distributions are, in a way, arbitrary points in a spectrum of distributions which would work well. In practice, both of these distributions work very well for a number of different games.
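As a rough illustration of the calculation Elo describes, the following Python sketch computes the expected score from the normal model with σ' = 200√2 and compares it with a base-10 logistic curve on a 400-point scale. The function names are illustrative only and are not taken from Elo's book or from any federation's software.

```python
import math

# Standard deviation of the difference of two performances (about 282.84).
SIGMA_DIFF = 200 * math.sqrt(2)


def normal_expected_score(d: float) -> float:
    """Expected score for a player rated d points above the opponent (normal model)."""
    z = d / SIGMA_DIFF
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def logistic_expected_score(d: float) -> float:
    """Expected score under a base-10 logistic curve with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** (-d / 400))


if __name__ == "__main__":
    for d in (0, 100, 160, 200, 400):
        print(d, round(normal_expected_score(d), 3), round(logistic_expected_score(d), 3))
```

For D = 160 the normal model gives roughly 0.714, in line with Elo's worked example, and the logistic curve gives a nearby value.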


Different ratings systems

The phrase "Elo rating" is often used to mean a player's chess rating as calculated by FIDE. However, this usage may be confusing or misleading because Elo's general ideas have been adopted by many organizations, including the USCF (before FIDE), many other national chess federations, the short-lived Professional Chess Association (PCA), and online chess servers including the Internet Chess Club (ICC), Free Internet Chess Server (FICS), and Yahoo! Games. Each organization has a unique implementation, and none of them follows Elo's original suggestions precisely.
To avoid confusion, one may instead refer to the organization granting the rating. For example: "As of August 2002, Gregory Kaidanov had a FIDE rating of 2638 and a USCF rating of 2742." The Elo ratings of these various organizations are not always directly comparable, since Elo ratings measure the results within a closed pool of players rather than absolute skill.


FIDE ratings

For top players, the most important rating is their FIDE rating. FIDE has issued the following lists:
* From 1971 to 1980, one list a year was issued.
* From 1981 to 2000, two lists a year were issued, in January and July.
* From July 2000 to July 2009, four lists a year were issued, at the start of January, April, July and October.
* From July 2009 to July 2012, six lists a year were issued, at the start of January, March, May, July, September and November.
* Since July 2012, the list has been updated monthly.
The following analysis of the July 2015 FIDE rating list gives a rough impression of what a given FIDE rating means in terms of world ranking:
* 5,323 players had an active rating in the range 2200 to 2299, which is usually associated with the Candidate Master title.
* 2,869 players had an active rating in the range 2300 to 2399, which is usually associated with the FIDE Master title.
* 1,420 players had an active rating between 2400 and 2499, most of whom had either the International Master or the International Grandmaster title.
* 542 players had an active rating between 2500 and 2599, most of whom had the International Grandmaster title.
* 187 players had an active rating between 2600 and 2699, all of whom had the International Grandmaster title.
* 40 players had an active rating between 2700 and 2799.
* 4 players had an active rating of over 2800. (Magnus Carlsen was rated 2853, and 3 players were rated between 2814 and 2816.)
The highest ever FIDE rating was 2882, which Magnus Carlsen had on the May 2014 list. A list of the highest-rated players ever is at Comparison of top chess players throughout history.


Performance rating

Performance rating or special rating is a hypothetical rating that would result from the games of a single event only. Some chess organizations use the "algorithm of 400" to calculate performance rating. According to this algorithm, performance rating for an event is calculated in the following way:
# For each win, add your opponent's rating plus 400.
# For each loss, add your opponent's rating minus 400.
# Divide this sum by the number of games played.
Example: 2 wins (opponents ''w'' and ''x''), 2 losses (opponents ''y'' and ''z''):
: \begin{align} & \frac{w + 400 + x + 400 + y - 400 + z - 400}{4} \\[6pt] & \frac{w + x + y + z + 400(2) - 400(2)}{4} \end{align}
This can be expressed by the following formula:
: \text{Performance rating} = \frac{\text{Total of opponents' ratings} + 400 \times (\text{Wins} - \text{Losses})}{\text{Games}}
Example: If you beat a player with an Elo rating of 1000,
: \text{Performance rating} = \frac{1000 + 400 \times 1}{1} = 1400
If you beat two players with Elo ratings of 1000,
: \text{Performance rating} = \frac{2000 + 400 \times 2}{2} = 1400
If you draw,
: \text{Performance rating} = \frac{1000 + 400 \times 0}{1} = 1000
This is a simplification, but it offers an easy way to get an estimate of PR (performance rating).
FIDE, however, calculates performance rating by means of the formula: Opponents' Rating Average + Rating Difference. The Rating Difference d_p is found in a lookup table keyed by the player's tournament percentage score p, where p is simply the number of points scored divided by the number of games played. Note that, in the case of a perfect score or no score, d_p is 800. The full table can be found in the FIDE Handbook, B. Permanent Commissions, 02. FIDE Rating Regulations (Qualification Commission), FIDE Rating Regulations effective from 1 July 2017, 8.1a, online. A simplified version of this table is on the right.
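The "algorithm of 400" described above can be sketched in a few lines of Python. The function name and argument layout are assumptions made for illustration; draws are simply games counted as neither wins nor losses.

```python
def performance_rating_400(opponent_ratings, wins, losses):
    """Estimate a performance rating with the 'algorithm of 400'.

    opponent_ratings: ratings of every opponent faced in the event.
    wins, losses: number of games won and lost; remaining games are draws.
    """
    games = len(opponent_ratings)
    if games == 0:
        raise ValueError("at least one game is required")
    if wins < 0 or losses < 0 or wins + losses > games:
        raise ValueError("wins and losses must fit within the games played")
    # Sum of opponents' ratings, plus 400 per win and minus 400 per loss.
    return (sum(opponent_ratings) + 400 * (wins - losses)) / games


if __name__ == "__main__":
    print(performance_rating_400([1000], wins=1, losses=0))
    print(performance_rating_400([1000, 1000], wins=2, losses=0))
    print(performance_rating_400([1000], wins=0, losses=0))
```

This mirrors only the simplified estimate; the FIDE method relies on a lookup table for d_p that is not reproduced here.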


Live ratings

FIDE updates its ratings list at the beginning of each month. In contrast, the unofficial "Live ratings" calculate the change in players' ratings after every game. These Live ratings are based on the previously published FIDE ratings, so a player's Live rating is intended to correspond to what the FIDE rating would be if FIDE were to issue a new list that day. Although Live ratings are unofficial, interest arose in Live ratings in August/September 2008 when five different players took the "Live" No. 1 ranking.
The unofficial live ratings of players over 2700 were published and maintained by Hans Arild Runde at the Live Rating website until August 2011. Another website, 2700chess.com, has been maintained since May 2011 by Artiom Tsepotan and covers the top 100 players as well as the top 50 female players.
Rating changes can be calculated manually by using the FIDE ratings change calculator. All top players have a K-factor of 10, which means that the maximum ratings change from a single game is a little less than 10 points.


United States Chess Federation ratings

The United States Chess Federation (USCF) uses its own classification of players:
* 2400 and above: Senior Master
* 2200–2399: National Master
** 2200–2399 plus 300 games above 2200: Original Life Master
* 2000–2199: Expert or Candidate Master
* 1800–1999: Class A
* 1600–1799: Class B
* 1400–1599: Class C
* 1200–1399: Class D
* 1000–1199: Class E
* 800–999: Class F
* 600–799: Class G
* 400–599: Class H
* 200–399: Class I
* 100–199: Class J


The K-factor used by the USCF

The ''K-factor'', in the USCF rating system, can be estimated by dividing 800 by the sum of the effective number of games a player's rating is based on (''N_e'') and the number of games the player completed in a tournament (''m''):
: K = \frac{800}{N_e + m}
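A minimal sketch of this estimate in Python, assuming only the relationship stated above (800 divided by the effective number of games plus the games completed in the event). The function name is hypothetical, and the sample inputs are made up for illustration.

```python
def uscf_k_estimate(effective_games: float, games_in_event: int) -> float:
    """Estimate the USCF K-factor as 800 / (N_e + m)."""
    total = effective_games + games_in_event
    if total <= 0:
        raise ValueError("N_e + m must be positive")
    return 800 / total


if __name__ == "__main__":
    # A player with few effective games gets a large K; an experienced one gets a small K.
    print(uscf_k_estimate(effective_games=6, games_in_event=4))
    print(uscf_k_estimate(effective_games=46, games_in_event=4))
```

The estimate shrinks as a rating accumulates more games, so established ratings move more slowly than new ones.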


Rating floors

The USCF maintains an absolute rating floor of 100 for all ratings. Thus, no member can have a rating below 100, no matter their performance at USCF-sanctioned events. However, players can have higher individual absolute rating floors, calculated using the following formula:
: AF = \operatorname{min}\{100 + 4N_W + 2N_D + N_R,\ 150\}
where N_W is the number of rated games won, N_D is the number of rated games drawn, and N_R is the number of events in which the player completed three or more rated games.
Higher rating floors exist for experienced players who have achieved significant ratings. These floors start at 1200 and rise in 100-point increments up to 2100 (1200, 1300, 1400, ..., 2100). A rating floor is calculated by taking the player's peak established rating, subtracting 200 points, and then rounding down to the nearest rating floor. For example, a player who has reached a peak rating of 1464 would have a rating floor of 1464 − 200 = 1264, which would be rounded down to 1200. Under this scheme, only Class C players and above are capable of having a higher rating floor than their absolute player rating. All other players would have a floor of at most 150.
There are two ways to achieve higher rating floors other than under the standard scheme presented above. If a player has achieved the rating of Original Life Master, their rating floor is set at 2200. The achievement of this title is unique in that no other recognized USCF title will result in a new floor. For players with ratings below 2000, winning a cash prize of $2,000 or more raises that player's rating floor to the closest 100-point level that would have disqualified the player for participation in the tournament. For example, if a player won $4,000 in a 1750-and-under tournament, they would now have a rating floor of 1800.
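The standard peak-based floor can be sketched as follows. This is an illustrative reading of the rule described above (subtract 200 from the peak, round down to a multiple of 100, and keep the result only if it lies in the 1200 to 2100 ladder); the function name is hypothetical, and the Original Life Master and cash-prize floors are deliberately left out.

```python
from typing import Optional


def peak_based_floor(peak_rating: int) -> Optional[int]:
    """Return the standard rating floor implied by a peak established rating.

    Returns None when the peak is too low to earn a floor on the 1200-2100 ladder.
    """
    # Subtract 200 points, then round down to the nearest multiple of 100.
    candidate = ((peak_rating - 200) // 100) * 100
    if candidate < 1200:
        return None
    # The standard ladder tops out at 2100.
    return min(candidate, 2100)


if __name__ == "__main__":
    for peak in (1350, 1464, 1999, 2450):
        print(peak, peak_based_floor(peak))
```

A peak of 1464 yields 1200, matching the worked example in the text.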


Theory

Pairwise comparisons form the basis of the Elo rating methodology. Elo made references to the papers of Good, David, Trawinski and David, and Buhlman and Huber.


Mathematical details

Performance is not measured absolutely; it is inferred from wins, losses, and draws against other players. Players' ratings depend on the ratings of their opponents and the results scored against them. The difference in rating between two players determines an estimate for the expected score between them. Both the average and the spread of ratings can be arbitrarily chosen. The USCF initially aimed for an average club player to have a rating of 1500, and Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an ''expected score'' (basically an expected average score) of approximately 0.75.
A player's ''expected score'' is their probability of winning plus half their probability of drawing. Thus, an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead, a draw is considered half a win and half a loss. In practice, since the true strength of each player is unknown, the expected scores are calculated using the players' current ratings as follows.
If player A has a rating of R_\mathsf{A} and player B a rating of R_\mathsf{B}, the exact formula (using the logistic curve with base 10) for the expected score of player A is
: E_\mathsf{A} = \frac{1}{1 + 10^{(R_\mathsf{B} - R_\mathsf{A})/400}} ~.
Similarly, the expected score for player B is
: E_\mathsf{B} = \frac{1}{1 + 10^{(R_\mathsf{A} - R_\mathsf{B})/400}} ~.
This could also be expressed by
: E_\mathsf{A} = \frac{Q_\mathsf{A}}{Q_\mathsf{A} + Q_\mathsf{B}}
and
: E_\mathsf{B} = \frac{Q_\mathsf{B}}{Q_\mathsf{A} + Q_\mathsf{B}} ~,
where Q_\mathsf{A} = 10^{R_\mathsf{A}/400} and Q_\mathsf{B} = 10^{R_\mathsf{B}/400}. Note that in the latter case, the same denominator applies to both expressions, and it is plain that E_\mathsf{A} + E_\mathsf{B} = 1. This means that by studying only the numerators, we find out that the expected score for player A is Q_\mathsf{A}/Q_\mathsf{B} times greater than the expected score for player B. It then follows that for each 400 rating points of advantage over the opponent, the expected score is magnified ten times in comparison to the opponent's expected score.
When a player's actual tournament scores exceed their expected scores, the Elo system takes this as evidence that the player's rating is too low and needs to be adjusted upward. Similarly, when a player's actual tournament scores fall short of their expected scores, that player's rating is adjusted downward. Elo's original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player over-performed or under-performed their expected score. The maximum possible adjustment per game, called the K-factor, was set at K = 16 for masters and K = 32 for weaker players.
Suppose player A (again with rating R_\mathsf{A}) was expected to score E_\mathsf{A} points but actually scored S_\mathsf{A} points. The formula for updating that player's rating is
: R_\mathsf{A}' = R_\mathsf{A} + K \cdot (S_\mathsf{A} - E_\mathsf{A}) ~.
This update can be performed after each game or each tournament, or after any suitable rating period.
This updating procedure is at the core of the ratings used by FIDE, USCF, Yahoo! Games, the Internet Chess Club (ICC) and the Free Internet Chess Server (FICS). However, each organization has taken a different route to deal with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to deal with the problem of ratings inflation/deflation. New players are assigned provisional ratings, which are adjusted more drastically than established ratings.
The principles used in these rating systems can be used for rating other competitions, for instance international football matches. Elo ratings have also been applied to games without the possibility of draws, and to games in which the result can also have a quantity (small/big margin) in addition to the quality (win/loss). See Go rating with Elo for more.
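The expected-score and update rules of this section can be sketched in Python as follows. The ratings, the K-factor of 32, and the function names are hypothetical values chosen for illustration; the sketch assumes the base-10 logistic curve with a 400-point scale and the linear update R' = R + K(S − E).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (base-10 logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_rating(rating: float, k: float, actual: float, expected: float) -> float:
    """Linear Elo update: move the rating by K times the over- or under-performance."""
    return rating + k * (actual - expected)


if __name__ == "__main__":
    # Hypothetical single game: the lower-rated player B scores an upset win.
    r_a, r_b, k = 1600.0, 1400.0, 32.0
    e_a = expected_score(r_a, r_b)
    e_b = expected_score(r_b, r_a)
    new_a = update_rating(r_a, k, actual=0.0, expected=e_a)
    new_b = update_rating(r_b, k, actual=1.0, expected=e_b)
    print(round(e_a, 3), round(e_b, 3))
    print(round(new_a, 1), round(new_b, 1))
```

Because both players share the same K here, the roughly 24 points that A loses are exactly the points that B gains, which is the point transfer described in the introduction. When the two players have different K-factors, the exchange is no longer symmetric.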


Most accurate distribution model

The first mathematical concern addressed by the USCF was the use of the normal distribution. They found that this did not accurately represent the actual results achieved, particularly by the lower rated players. Instead they switched to a logistic distribution model, which the USCF found provided a better fit for the actual results achieved. FIDE also uses an approximation to the logistic distribution.


Most accurate K-factor

The second major concern is the correct "K-factor" used. The chess statistician Jeff Sonas believes that the original K = 10 value (for players rated above 2400) is inaccurate in Elo's work. If the K-factor coefficient is set too large, there will be too much sensitivity to just a few recent events, with a large number of points exchanged in each game. If the K-value is too low, the sensitivity will be minimal, and the system will not respond quickly enough to changes in a player's actual level of performance.

Elo's original K-factor estimation was made without the benefit of huge databases and statistical evidence. Sonas indicates that a K-factor of 24 (for players rated above 2400) may be both more accurate as a predictive tool of future performance and more sensitive to performance.

Certain Internet chess sites seem to avoid a three-level K-factor staggering based on rating range. For example, the ICC seems to adopt a global K = 32 except when playing against provisionally rated players.

The USCF (which makes use of a logistic distribution as opposed to a normal distribution) formerly staggered the K-factor according to three main rating ranges. Currently, the USCF uses a formula that calculates the K-factor based on factors including the number of games played and the player's rating. The K-factor is also reduced for high rated players if the event has shorter time controls. FIDE likewise assigns the K-factor by ranges, and used a different set of ranges before July 2014.

The gradation of the K-factor reduces rating change at the top end of the rating range, reducing the possibility for rapid rise or fall of rating for those with a rating high enough to reach a low K-factor. In theory, this might apply equally to online chess players and over-the-board players, since it is more difficult for all players to raise their rating after their rating has become high and their K-factor consequently reduced. However, when playing online, 2800+ players can more easily raise their rating by simply selecting opponents with high ratings: on the ICC playing site, a grandmaster may play a string of different opponents who are all rated over 2700. In over-the-board events, it would only be in very high level all-play-all events that a player would be able to engage that number of 2700+ opponents. In a normal, open, Swiss-paired chess tournament, frequently there would be many opponents rated less than 2500, reducing the ratings gains possible from a single contest for a high-rated player.
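The sensitivity trade-off described in this section can be made concrete with a minimal sketch. It assumes the standard logistic expected score with a 400-point scale and the update R' = R + K(S - E); the function names `expected_score` and `rating_change` are illustrative and are not taken from any federation's software.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Logistic expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / scale))


def rating_change(rating_a: float, rating_b: float, score_a: float, k: float) -> float:
    """Points gained (or lost) by player A for one game with K-factor k."""
    return k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Hypothetical game: a 2500-rated player beats a 2700-rated opponent.
    for k in (10, 24, 32):
        delta = rating_change(2500, 2700, 1.0, k)
        print(f"K = {k:2d}: rating change = {delta:+.2f}")
```

The change scales linearly with K, so the same result moves a rating 3.2 times as far under K = 32 as under K = 10.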


Formal derivation for win/loss games

The above expressions can be now formally derived by exploiting the link between the Elo rating and the stochastic gradient update in the logistic regression.

If we assume that the game results are binary, that is, only a win or a loss can be observed, the problem can be addressed via logistic regression, where the games results are dependent variables, the players' ratings are independent variables, and the model relating both is probabilistic: the probability of the player \mathsf{A} winning the game is modeled as
: \Pr\{\mathsf{A}~\textrm{wins}\} = \sigma(r_{\mathsf{A,B}}), \quad \sigma(r)=\frac{1}{1+10^{-r/s}},
where
: r_{\mathsf{A,B}} = (R_\mathsf{A} - R_\mathsf{B})
denotes the difference of the players' ratings, and we use a scaling factor s=400, and, by law of total probability
: \Pr\{\mathsf{B}~\textrm{wins}\} = 1-\sigma(r_{\mathsf{A,B}})=\sigma(-r_{\mathsf{A,B}}).

The log loss is then calculated as
: \ell = \begin{cases} -\log \sigma(r_{\mathsf{A,B}}) & \textrm{if}~ \mathsf{A}~\textrm{wins},\\ -\log \sigma(-r_{\mathsf{A,B}}) & \textrm{if}~ \mathsf{B}~\textrm{wins}, \end{cases}
and, using the stochastic gradient descent the log loss is minimized as follows:
: R_\mathsf{A}\leftarrow R_\mathsf{A} - \eta \frac{\mathrm{d}\ell}{\mathrm{d} R_\mathsf{A}},
: R_\mathsf{B}\leftarrow R_\mathsf{B} - \eta \frac{\mathrm{d}\ell}{\mathrm{d} R_\mathsf{B}},
where \eta is the adaptation step.

Since \frac{\mathrm{d}}{\mathrm{d} r}\log\sigma(r)=\frac{\log 10}{s}\sigma(-r), \frac{\mathrm{d} r_{\mathsf{A,B}}}{\mathrm{d} R_\mathsf{A}}=1, and \frac{\mathrm{d} r_{\mathsf{A,B}}}{\mathrm{d} R_\mathsf{B}}=-1, the adaptation is then written as follows
: R_\mathsf{A}\leftarrow \begin{cases} R_\mathsf{A} + K \sigma(-r_{\mathsf{A,B}}) & \textrm{if}~\mathsf{A}~\textrm{wins},\\ R_\mathsf{A} - K \sigma(r_{\mathsf{A,B}}) & \textrm{if}~\mathsf{B}~\textrm{wins}, \end{cases}
which may be compactly written as
: R_\mathsf{A}\leftarrow R_\mathsf{A} + K (S_\mathsf{A}-E_\mathsf{A})
where K=\eta\log 10/s is the new adaptation step which absorbs \eta and s, S_\mathsf{A}=1 if \mathsf{A} wins and S_\mathsf{A}=0 if \mathsf{B} wins, and the expected score is given by E_\mathsf{A}=\sigma(r_{\mathsf{A,B}}).

Analogously, the update for the rating R_\mathsf{B} is
: R_\mathsf{B}\leftarrow R_\mathsf{B} + K (S_\mathsf{B}-E_\mathsf{B}).
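The derivation can be checked numerically. The sketch below compares a finite-difference gradient of the log loss with the closed form -(log 10 / s)(S_A - E_A) and then applies one gradient step. The function names (`numeric_gradient`, `analytic_gradient`, `sgd_step`) are illustrative, and the step size `eta` is an assumption chosen only so that K = 32.

```python
import math

S = 400.0  # scaling factor s used in the text


def sigma(r: float) -> float:
    """Logistic win probability for a rating difference r."""
    return 1.0 / (1.0 + 10.0 ** (-r / S))


def log_loss(r_a: float, r_b: float, a_wins: bool) -> float:
    """Log loss of a single win/loss game."""
    r = r_a - r_b
    return -math.log(sigma(r) if a_wins else sigma(-r))


def numeric_gradient(r_a: float, r_b: float, a_wins: bool, h: float = 1e-3) -> float:
    """Central finite-difference estimate of d(loss)/dR_A."""
    return (log_loss(r_a + h, r_b, a_wins) - log_loss(r_a - h, r_b, a_wins)) / (2 * h)


def analytic_gradient(r_a: float, r_b: float, a_wins: bool) -> float:
    """Closed form from the derivation: -(log 10 / s) * (S_A - E_A)."""
    s_a = 1.0 if a_wins else 0.0
    return -(math.log(10.0) / S) * (s_a - sigma(r_a - r_b))


def sgd_step(r_a: float, r_b: float, a_wins: bool, eta: float) -> tuple[float, float]:
    """One gradient step; equivalent to R + K (S - E) with K = eta * log(10) / s."""
    grad_a = analytic_gradient(r_a, r_b, a_wins)
    # d(loss)/dR_B = -d(loss)/dR_A because r = R_A - R_B.
    return r_a - eta * grad_a, r_b + eta * grad_a


if __name__ == "__main__":
    eta = 32.0 * S / math.log(10.0)  # assumption: step size that gives K = 32
    print(numeric_gradient(1600.0, 1500.0, True), analytic_gradient(1600.0, 1500.0, True))
    print(sgd_step(1500.0, 1500.0, True, eta))
```

Because the two gradients have opposite signs, the points gained by one player equal the points lost by the other.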


Formal derivation for win/draw/loss games

Since the very beginning, the Elo rating has also been used in chess, where we observe wins, losses or draws and, to deal with the latter, a fractional score value, S_\mathsf{A}=0.5, is introduced. We note, however, that the scores S_\mathsf{A}=1 and S_\mathsf{A}=0 are merely indicators of the events when the player \mathsf{A} wins or loses the game. It is, therefore, not immediately clear what the meaning of the fractional score is. Moreover, since we do not specify explicitly the model relating the rating values R_\mathsf{A} and R_\mathsf{B} to the probability of the game outcome, we cannot say what the probability of the win, the loss, or the draw is.

To address these difficulties, and to derive the Elo rating in ternary games, we will define the explicit probabilistic model of the outcomes. Next, we will minimize the log loss via stochastic gradient.

Since the loss, the draw, and the win are ordinal variables, we should adopt a model which takes their ordinal nature into account, and we use the so-called adjacent categories model, which may be traced to Davidson's work
: \Pr\{\mathsf{A}~\textrm{wins}\} = \sigma(r_{\mathsf{A,B}}; \kappa),
: \Pr\{\mathsf{B}~\textrm{wins}\} = \sigma(-r_{\mathsf{A,B}}; \kappa),
: \Pr\{\mathsf{A}~\textrm{draws}\} = \kappa\sqrt{\sigma(r_{\mathsf{A,B}}; \kappa)\sigma(-r_{\mathsf{A,B}}; \kappa)},
where
: \sigma(r; \kappa) = \frac{10^{r/s}}{10^{-r/s}+\kappa + 10^{r/s}}
and \kappa\ge 0 is a parameter. Introduction of a free parameter should not be surprising as we have three possible outcomes and thus, an additional degree of freedom should appear in the model. In particular, with \kappa=0 we recover the model underlying the logistic regression
: \Pr\{\mathsf{A}~\textrm{wins}\} = \sigma(r_{\mathsf{A,B}};0)=\frac{10^{r_{\mathsf{A,B}}/s}}{10^{-r_{\mathsf{A,B}}/s}+10^{r_{\mathsf{A,B}}/s}}=\frac{1}{1+10^{-r_{\mathsf{A,B}}/s'}},
where s' = s/2.

Using the ordinal model defined above, the log loss is now calculated as
: \ell = \begin{cases} -\log \sigma(r_{\mathsf{A,B}};\kappa) & \textrm{if}~ \mathsf{A}~\textrm{wins},\\ -\log \sigma(-r_{\mathsf{A,B}};\kappa) & \textrm{if}~ \mathsf{B}~\textrm{wins},\\ -\log \kappa -\frac{1}{2}\log\sigma(r_{\mathsf{A,B}};\kappa) - \frac{1}{2}\log\sigma(-r_{\mathsf{A,B}};\kappa) & \textrm{if}~ \mathsf{A}~\textrm{draws}, \end{cases}
which may be compactly written as
: \ell = -(S_\mathsf{A} +\frac{1}{2}D)\log \sigma(r_{\mathsf{A,B}};\kappa) -(S_\mathsf{B} +\frac{1}{2}D) \log \sigma(-r_{\mathsf{A,B}};\kappa) -D\log \kappa
where S_\mathsf{A}=1 iff \mathsf{A} wins, S_\mathsf{B}=1 iff \mathsf{B} wins, and D=1 iff \mathsf{A} draws.

As before, we need the derivative of \log\sigma(r;\kappa), which is given by
: \frac{\mathrm{d}}{\mathrm{d} r}\log\sigma(r; \kappa) =\frac{2\log 10}{s} [1-g(r;\kappa)],
where
: g(r;\kappa)= \frac{10^{r/s}+\kappa/2}{10^{-r/s}+\kappa + 10^{r/s}}.

Thus, the derivative of the log loss with respect to the rating R_\mathsf{A} is given by
: \begin{aligned} \frac{\mathrm{d}}{\mathrm{d} R_\mathsf{A}}\ell &= -\frac{2\log 10}{s} \left( (S_\mathsf{A} +0.5D)[1-g(r_{\mathsf{A,B}};\kappa)]-(S_\mathsf{B} +0.5D)g(r_{\mathsf{A,B}};\kappa) \right)\\ &= -\frac{2\log 10}{s} \left(S_\mathsf{A} + 0.5D-g(r_{\mathsf{A,B}};\kappa)\right), \end{aligned}
where we used the relationships S_\mathsf{A} + S_\mathsf{B} + D=1 and g(-r;\kappa)=1-g(r;\kappa).

Then, the stochastic gradient descent applied to minimize the log loss yields the following update for the rating R_\mathsf{A}
: R_\mathsf{A}\leftarrow R_\mathsf{A} + K (\hat{S}_\mathsf{A}- g(r_{\mathsf{A,B}};\kappa))
where K=2\eta\log 10/s and \hat{S}_\mathsf{A}= S_\mathsf{A} + 0.5D. Of course, \hat{S}_\mathsf{A}= 1 if \mathsf{A} wins, \hat{S}_\mathsf{A}= 0.5 if \mathsf{A} draws, and \hat{S}_\mathsf{A}= 0 if \mathsf{A} loses. To recognize the origin in the model proposed by Davidson, this update is called an Elo-Davidson rating.

The update for R_\mathsf{B} is derived in the same manner as
: R_\mathsf{B}\leftarrow R_\mathsf{B} + K (\hat{S}_\mathsf{B}- g(r_{\mathsf{B,A}};\kappa)),
where r_{\mathsf{B,A}}=R_\mathsf{B}-R_\mathsf{A}=-r_{\mathsf{A,B}}.

We note that
: \begin{aligned} E[\hat{S}_\mathsf{A}] &=\Pr\{\mathsf{A}~\textrm{wins}\}+0.5\Pr\{\mathsf{A}~\textrm{draws}\}\\ &=\sigma(r_{\mathsf{A,B}};\kappa)+0.5\kappa\sqrt{\sigma(r_{\mathsf{A,B}};\kappa)\sigma(-r_{\mathsf{A,B}};\kappa)}\\ &=g(r_{\mathsf{A,B}};\kappa) \end{aligned}
and thus, the rating update may be written as
: R_\mathsf{A}\leftarrow R_\mathsf{A} + K (\hat{S}_\mathsf{A}- E_\mathsf{A}),
where E_\mathsf{A}=E[\hat{S}_\mathsf{A}], and we obtain practically the same equation as in the Elo rating except that the expected score is given by E_\mathsf{A}=g(r_{\mathsf{A,B}};\kappa) instead of E_\mathsf{A}=\sigma(r_{\mathsf{A,B}}).

Of course, as noted above, for \kappa=0 we have g(r;0) = \sigma(r) (with the scale s replaced by s' = s/2) and thus, the Elo-Davidson rating is exactly the same as the Elo rating. However, this is of no help in understanding the case when draws are observed (we cannot use \kappa=0, which would mean that the probability of a draw is null). On the other hand, if we use \kappa=2, we have
: g(r;2)= \frac{10^{r/s}+1}{10^{-r/s}+2 + 10^{r/s}} =\frac{1}{1+10^{-r/s}} =\sigma(r)
which means that, using \kappa=2, the Elo-Davidson rating is exactly the same as the Elo rating.
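The properties claimed for the Elo-Davidson model can be checked with a short sketch. It assumes s = 400 and uses illustrative function names (`outcome_probabilities`, `g`, `elo_davidson_update`). It evaluates the three outcome probabilities, the expected score g(r; \kappa), and one rating update.

```python
import math

S = 400.0  # scaling factor s used in the text


def outcome_probabilities(r: float, kappa: float) -> tuple[float, float, float]:
    """Return (P[A wins], P[draw], P[B wins]) for rating difference r = R_A - R_B."""
    x = 10.0 ** (r / S)
    denom = 1.0 / x + kappa + x
    p_win = x / denom
    p_loss = (1.0 / x) / denom
    p_draw = kappa / denom  # same as kappa * sqrt(p_win * p_loss)
    return p_win, p_draw, p_loss


def g(r: float, kappa: float) -> float:
    """Expected score of player A under the Davidson model."""
    x = 10.0 ** (r / S)
    return (x + kappa / 2.0) / (1.0 / x + kappa + x)


def elo_davidson_update(r_a: float, r_b: float, score_a: float,
                        k: float, kappa: float) -> tuple[float, float]:
    """Apply R <- R + K (S_hat - g) to both players; score_a is 1, 0.5 or 0."""
    delta = k * (score_a - g(r_a - r_b, kappa))
    # g(-r) = 1 - g(r) and S_hat_B = 1 - S_hat_A, so B's change is the negative of A's.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    p_win, p_draw, p_loss = outcome_probabilities(150.0, 0.7)
    print(p_win + p_draw + p_loss)                        # probabilities sum to one
    print(g(150.0, 2.0), 1.0 / (1.0 + 10.0 ** (-150.0 / S)))  # kappa = 2 matches plain Elo
    print(elo_davidson_update(1600.0, 1500.0, 0.5, 20.0, 2.0))
```

The values \kappa = 0.7, K = 20 and the ratings are arbitrary choices made for the illustration.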


Practical issues


Game activity versus protecting one's rating

In some cases the rating system can discourage game activity for players who wish to protect their rating. In order to discourage players from sitting on a high rating, a 2012 proposal by British Grandmaster John Nunn for choosing qualifiers to the chess world championship included an activity bonus, to be combined with the rating.

Beyond the chess world, concerns over players avoiding competitive play to protect their ratings caused Wizards of the Coast to abandon the Elo system for ''Magic: The Gathering'' tournaments in favour of a system of their own devising called "Planeswalker Points".


Selective pairing

A more subtle issue is related to pairing. When players can choose their own opponents, they can choose opponents with minimal risk of losing, and maximum reward for winning. Particular examples of players rated 2800+ choosing opponents with minimal risk and maximum possibility of rating gain include: choosing opponents that they know they can beat with a certain strategy; choosing opponents that they think are overrated; or avoiding playing strong players who are rated several hundred points below them, but may hold chess titles such as IM or GM.

In the category of choosing overrated opponents, new entrants to the rating system who have played fewer than 50 games are in theory a convenient target as they may be overrated in their provisional rating. The ICC compensates for this issue by assigning a lower K-factor to the established player if they do win against a new rating entrant. The K-factor is actually a function of the number of rated games played by the new entrant.

Therefore, Elo ratings online still provide a useful mechanism for providing a rating based on the opponent's rating. Their overall credibility, however, needs to be seen in the context of at least the two major issues described above: engine abuse and selective pairing of opponents.

The ICC has also recently introduced "auto-pairing" ratings which are based on random pairings, but with each win in a row ensuring a statistically much harder opponent who has also won x games in a row. With potentially hundreds of players involved, this creates some of the challenges of a major large Swiss event which is being fiercely contested, with round winners meeting round winners. This approach to pairing certainly maximizes the rating risk of the higher-rated participants, who may face very stiff opposition from players below 3000, for example. This is a separate rating in itself, and is under "1-minute" and "5-minute" rating categories. Maximum ratings achieved over 2500 are exceptionally rare.


Ratings inflation and deflation

The term "inflation", applied to ratings, is meant to suggest that the level of playing strength demonstrated by a player at a given rating is decreasing over time; conversely, "deflation" suggests that the level is advancing. For example, if there is inflation, a modern rating of 2500 means less than a historical rating of 2500, while the reverse is true if there is deflation. Using ratings to compare players between different eras is made more difficult when inflation or deflation are present. (See also Comparison of top chess players throughout history.)

Analyzing FIDE rating lists over time, Jeff Sonas suggests that inflation may have taken place since about 1985. Sonas looks at the highest-rated players, rather than all rated players, and acknowledges that the changes in the distribution of ratings could have been caused by an increase of the standard of play at the highest levels, but looks for other causes as well.

The number of people with ratings over 2700 has increased. Around 1979 there was only one active player (Anatoly Karpov) with a rating this high. In 1992 Viswanathan Anand was only the 8th player in chess history to reach the 2700 mark at that point of time. This increased to 15 players by 1994. 33 players had a 2700+ rating in 2009 and 44 as of September 2012. The current benchmark for elite players lies beyond 2800.

One possible cause for this inflation was the rating floor, which for a long time was at 2200, and if a player dropped below this they were struck from the rating list. As a consequence, players at a skill level just below the floor would only be on the rating list if they were overrated, and this would cause them to feed points into the rating pool. In July 2000 the average rating of the top 100 was 2644. By July 2012 it had increased to 2703.

Using a strong chess engine to evaluate moves played in games between rated players, Regan and Haworth analyze sets of games from FIDE-rated tournaments, and draw the conclusion that there had been little or no inflation from 1976 to 2009.

In a pure Elo system, each game ends in an equal transaction of rating points. If the winner gains N rating points, the loser will drop by N rating points. This prevents points from entering or leaving the system when games are played and rated. However, players tend to enter the system as novices with a low rating and retire from the system as experienced players with a high rating. Therefore, in the long run a system with strictly equal transactions tends to result in rating deflation.

In 1995, the USCF acknowledged that several young scholastic players were improving faster than the rating system was able to track. As a result, established players with stable ratings started to lose rating points to the young and underrated players. Several of the older established players were frustrated over what they considered an unfair rating decline, and some even quit chess over it. (A conversation with Mark Glickman, published in ''Chess Life'', October 2006 issue.)
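The "equal transaction" property of a pure Elo system can be illustrated with a minimal sketch. The player names, the starting ratings and the choice K = 20 are assumptions made for the example. Whatever the results, the sum of all ratings in a closed pool is unchanged.

```python
def play(ratings: dict[str, float], a: str, b: str, score_a: float, k: float = 20.0) -> None:
    """Rate one game in place with an equal and opposite transfer of points."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((ratings[b] - ratings[a]) / 400.0))
    delta = k * (score_a - expected_a)
    ratings[a] += delta  # what A gains ...
    ratings[b] -= delta  # ... is exactly what B loses


if __name__ == "__main__":
    pool = {"veteran": 2000.0, "regular": 1800.0, "novice": 1200.0}
    before = sum(pool.values())
    # The improving novice wins twice; the other two players draw.
    play(pool, "novice", "regular", 1.0)
    play(pool, "novice", "veteran", 1.0)
    play(pool, "regular", "veteran", 0.5)
    print(f"total before: {before:.2f}, total after: {sum(pool.values()):.2f}")
    print(pool)
```

In such a pool an improving player can only gain points at the expense of established players, which is the mechanism behind the deflation described above.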


Combating deflation

Because of the significant difference in timing of when inflation and deflation occur, and in order to combat deflation, most implementations of Elo ratings have a mechanism for injecting points into the system in order to maintain relative ratings over time. FIDE has two inflationary mechanisms. First, performances below a "ratings floor" are not tracked, so a player with true skill below the floor can only be unrated or overrated, never correctly rated. Second, established and higher-rated players have a lower K-factor. New players have a ''K'' = 40, which drops to ''K'' = 20 after 30 played games, and to ''K'' = 10 when the player reaches 2400.

The current system in the United States includes a bonus point scheme which feeds rating points into the system in order to track improving players, and different K-values for different players. Some methods, used in Norway for example, differentiate between juniors and seniors, and use a larger K-factor for the young players, even doubling the rating gain when they score well above their predicted performance.

Rating floors in the United States work by guaranteeing that a player will never drop below a certain limit. This also combats deflation, but the chairman of the USCF Ratings Committee has been critical of this method because it does not feed the extra points to the improving players. A possible motive for these rating floors is to combat sandbagging, i.e., deliberate lowering of ratings to be eligible for lower rating class sections and prizes.
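The FIDE K-factor schedule described in this section can be written as a small function. This is a simplified sketch of the wording used here, not FIDE's regulations: the function name `k_factor`, its parameters, and the decision to check the new-player rule first are all assumptions.

```python
def k_factor(games_played: int, has_reached_2400: bool) -> int:
    """K-factor following the simplified schedule described in the text."""
    if games_played < 30:
        return 40  # new player (assumption: this rule is checked first)
    if has_reached_2400:
        return 10  # player has reached a rating of 2400
    return 20      # established player below 2400


if __name__ == "__main__":
    for games, reached in [(5, False), (30, False), (200, True)]:
        print(f"games={games:3d} reached_2400={reached}: K = {k_factor(games, reached)}")
```

Because the K-factor differs between the two players in a game, the points one player gains need not equal the points the other loses, which is how such a schedule can add points to the pool.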


Ratings of computers

Human–computer chess matches between 1997 (Deep Blue versus Garry Kasparov) and 2006 demonstrated that chess computers are capable of defeating even the strongest human players. However, chess engine ratings are difficult to quantify, due to variable factors such as the time control and the hardware the program runs on. Published engine rating lists such as CCRL are based on engine-only games on standard hardware configurations and are not directly comparable to FIDE ratings. For some ratings estimates, see Chess engine § Ratings.


Use outside of chess


Other board and card games

* ''Go'': The European Go Federation adopted an Elo-based rating system initially pioneered by the Czech Go Federation.
* ''Backgammon'': The popular First Internet Backgammon Server (FIBS) calculates ratings based on a modified Elo system. New players are assigned a rating of 1500, with the best humans and bots rating over 2000. The same formula has been adopted by several other backgammon sites, such as Play65, DailyGammon, GoldToken and VogClub. VogClub sets a new player's rating at 1600. The UK Backgammon Federation uses the FIBS formula for its UK national ratings.
* ''Scrabble'': National Scrabble organizations compute normally distributed Elo ratings except in the United Kingdom, where a different system is used. The North American Scrabble Players Association has the largest rated population of active members, numbering about 2,000 as of early 2011. Lexulous also uses the Elo system.
* Despite questions of the appropriateness of using the Elo system to rate games in which luck is a factor, trading-card game manufacturers often use Elo ratings for their organized play efforts. The DCI (formerly Duelists' Convocation International) used Elo ratings for tournaments of ''Magic: The Gathering'' and other Wizards of the Coast games. However, the DCI abandoned this system in 2012 in favor of a new cumulative system of "Planeswalker Points", chiefly because of the above-noted concern that Elo encourages highly rated players to avoid playing to "protect their rating". Pokémon USA uses the Elo system to rank its TCG organized play competitors. Prizes for the top players in various regions included holidays and world championship invites until the 2011–2012 season, when awards became based on a system of Championship Points, the rationale being the same as the DCI's for ''Magic: The Gathering''. Similarly, Decipher, Inc. used the Elo system for its ranked games such as ''Star Trek Customizable Card Game'' and ''Star Wars Customizable Card Game''.


Athletic sports

The Elo rating system is used in the chess portion of chess boxing. In order to be eligible for professional chess boxing, one must have an Elo rating of at least 1600, as well as having competed in 50 or more matches of amateur boxing or martial arts.

American college football used the Elo method as a portion of its Bowl Championship Series rating systems from 1998 to 2013, after which the BCS was replaced by the College Football Playoff. Jeff Sagarin of ''USA Today'' publishes team rankings for most American sports, which include Elo system ratings for college football. The use of rating systems was effectively scrapped with the creation of the College Football Playoff in 2014; participants in the CFP and its associated bowl games are chosen by a selection committee.

In other sports, individuals maintain rankings based on the Elo algorithm. These are usually unofficial, not endorsed by the sport's governing body. The
World Football Elo Ratings is an example of the method applied to men's football. In 2006, Elo ratings were adapted for Major League Baseball teams by Nate Silver, then of Baseball Prospectus. Based on this adaptation, both also made Elo-based Monte Carlo simulations of the odds of whether teams will make the playoffs. In 2014, Beyond the Box Score, an SB Nation site, introduced an Elo ranking system for international baseball.

In tennis, the Elo-based Universal Tennis Rating (UTR) rates players on a global scale, regardless of age, gender, or nationality. It is the official rating system of major organizations such as the Intercollegiate Tennis Association and World TeamTennis and is frequently used in segments on the Tennis Channel. The algorithm analyzes more than 8 million match results from over 800,000 tennis players worldwide. On May 8, 2018, Rafael Nadal, having won 46 consecutive sets in clay court matches, had a near-perfect clay UTR of 16.42.

In pool, an Elo-based system called Fargo Rate is used to rank players in organized amateur and professional competitions.

One of the few Elo-based rankings endorsed by a sport's governing body is the
FIFA Women's World Rankings The FIFA Women's World Rankings for Association football, football were introduced in 2003, with the first rankings published on 16 July of that year, as a follow-on to the existing FIFA World Rankings, Men's FIFA World Rankings. They attempt ...
, based on a simplified version of the Elo algorithm, which
FIFA FIFA (; stands for ''Fédération Internationale de Football Association'' ( French), meaning International Association Football Federation ) is the international governing body of association football, beach football and futsal. It was found ...
uses as its official ranking system for national teams in women's football. From the first ranking list after the
2018 FIFA World Cup The 2018 FIFA World Cup was the 21st FIFA World Cup, the quadrennial world championship for men's national Association football, football teams organized by FIFA. It took place in Russia from 14 June to 15 July 2018, after the country was awa ...
, FIFA has used Elo for their
FIFA World Rankings The FIFA Men's World Ranking is a ranking system for men's national teams in association football, led by Brazil . The teams of the men's member nations of FIFA, football's world governing body, are ranked based on their game results with the ...
. In 2015, Nate Silver, editor-in-chief of the statistical commentary website
FiveThirtyEight ''FiveThirtyEight'', sometimes rendered as ''538'', is an American website that focuses on opinion poll analysis, politics, economics, and sports blogging in the United States. The website, which takes its name from the number of electors in th ...
, and Reuben Fischer-Baum produced Elo ratings for every
National Basketball Association The National Basketball Association (NBA) is a professional basketball league in North America. The league is composed of 30 teams (29 in the United States and 1 in Canada) and is one of the major professional sports leagues in the United S ...
team and season through the 2014 season. In 2014 FiveThirtyEight created Elo-based ratings and win-projections for the American professional
National Football League The National Football League (NFL) is a professional American football league that consists of 32 teams, divided equally between the American Football Conference (AFC) and the National Football Conference (NFC). The NFL is one of the ...
. The English
Korfball Korfball ( nl, korfbal) is a ball sport, with similarities to netball and basketball. It is played by two teams of eight players with four female players and four male players in each team. The objective is to throw a ball into a netless bask ...
Association rated teams based on Elo ratings, to determine handicaps for their cup competition for the 2011/12 season. An Elo-based ranking of
National Hockey League The National Hockey League (NHL; french: Ligue nationale de hockey—LNH, ) is a professional ice hockey league in North America comprising 32 teams—25 in the United States and 7 in Canada. It is considered to be the top ranked professional ...
players has been developed. The hockey-Elo metric evaluates a player's overall two-way play: scoring AND defense in both even strength and power-play/penalty-kill situations. Rugbyleagueratings.com uses the Elo rating system to rank international and club
rugby league Rugby league football, commonly known as just rugby league and sometimes football, footy, rugby or league, is a full-contact sport played by two teams of thirteen players on a rectangular field measuring 68 metres (75 yards) wide and 112 ...
teams.


Video games and online games

Many video games use modified Elo systems in competitive gameplay. The MOBA game ''League of Legends'' used an Elo rating system prior to the second season of competitive play. The esports game ''Overwatch'', the basis of the Overwatch League professional sports organization, uses a derivative of the Elo system to rank competitive players, with various adjustments made between competitive seasons. ''World of Warcraft'' previously used the Glicko-2 system to team up and compare Arena players, but now uses a system similar to Microsoft's TrueSkill. The game ''Puzzle Pirates'' uses the Elo rating system to determine the standings in the various puzzles. The Elo system is also used in FIFA Mobile for the Division Rivals modes. The browser game ''Quidditch Manager'' uses the Elo rating to measure a team's performance. ''AirMech'' uses Elo ratings for 1v1, 2v2, and 3v3 random/team matchmaking. ''RuneScape 3'' used the Elo system in the 2016 rerelease of the bounty hunter minigame. ''MechWarrior Online'' instituted an Elo system for its new "Comp Queue" mode, effective with the June 20, 2017 patch. ''Age of Empires II DE'' uses the Elo system for its leaderboard and matchmaking, with new players starting at Elo 1000. Few video games use the original Elo rating system. According to Lichess, an online chess server, the Elo system is outdated, with Glicko-2 now being used by many chess organizations. ''PlayerUnknown's Battlegrounds'' is one of the few video games that uses the original Elo system. In ''Guild Wars'', Elo ratings are used to record guild rating gained and lost through guild-versus-guild battles. In 1998, an online gaming ladder called ''Clanbase'' was launched, which used the Elo scoring system to rank teams. The initial K-value was 30; it was changed to 5 in January 2007 and then to 15 in July 2009. The site went offline in 2013. A similar site, ''Scrimbase'', was launched in 2016 and also used the Elo scoring system to rank teams. Since 2005, ''Golden Tee Live'' has rated players based on the Elo system. New players start at 2100, with top players rated over 3000. Although many video games use different systems for matchmaking, it is common for players of ranked video games to refer to all matchmaking ratings as ''Elo''.
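
The effect of a K-value change such as Clanbase's can be sketched with the standard Elo update, in which a rating moves by K times the gap between the actual and the expected score. The Python sketch below is illustrative only: the function names `expected_score` and `update_rating` are hypothetical, the 1000-point starting ratings are made up for the example, and it assumes the textbook logistic formula with a 400-point scale, which may differ from what Clanbase or any particular game actually implemented.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a: float, rating_b: float, score_a: float, k: float) -> float:
    """New rating for A; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Two evenly rated teams (assumed 1000 each for illustration); the first one wins.
    for k in (30, 5, 15):
        gain = update_rating(1000, 1000, 1.0, k) - 1000
        print(f"K={k}: winner gains {gain:.1f} points")
```

With equal ratings the expected score is 0.5, so the winner gains half of K: 15, 2.5, and 7.5 points respectively. A smaller K makes ratings more stable but slower to reflect real changes in strength.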


Other usage

The Elo rating system has been used in soft biometrics, which concerns the identification of individuals using human descriptions. Comparative descriptions were used alongside the Elo rating system to provide robust and discriminative 'relative measurements', permitting accurate identification. The Elo rating system has also been used in biology for assessing male dominance hierarchies, and in automation and computer vision for fabric inspection. Online judge sites also use the Elo rating system or its derivatives. For example, Topcoder uses a modified version based on the normal distribution, while Codeforces uses another version based on the logistic distribution. The Elo rating system has also been noted in dating apps, such as the matchmaking app Tinder, which uses a variant of it.
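
The difference between a normal-based and a logistic-based variant comes down to the curve that maps a rating difference to an expected score. The Python sketch below compares the two under conventional textbook assumptions: a 400-point logistic scale and a per-player performance standard deviation of 200 points. The function names and parameter values are hypothetical choices for illustration, not the actual formulas or constants used by Topcoder or Codeforces.

```python
import math


def logistic_expected(diff: float, scale: float = 400.0) -> float:
    """Expected score for a rating advantage `diff` under a logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def normal_expected(diff: float, sigma: float = 200.0) -> float:
    """Expected score when each player's performance is normal with std `sigma`.

    The difference of two independent performances has std sigma * sqrt(2),
    so the win probability is Phi(diff / (sigma * sqrt(2))),
    which equals 0.5 * (1 + erf(diff / (2 * sigma))).
    """
    return 0.5 * (1.0 + math.erf(diff / (2.0 * sigma)))


if __name__ == "__main__":
    # Print both curves side by side for a few rating gaps.
    for diff in (0, 100, 200, 400, 800):
        print(diff, round(logistic_expected(diff), 4), round(normal_expected(diff), 4))
```

Under these assumptions the two curves nearly coincide for small rating gaps and diverge for large ones, where the logistic curve gives the lower-rated player a somewhat better chance than the normal curve does.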


References in the media

The Elo rating system was featured prominently in ''The Social Network'' during the algorithm scene where Mark Zuckerberg released Facemash. In the scene, Eduardo Saverin writes mathematical formulas for the Elo rating system on Zuckerberg's dormitory room window. Behind the scenes, the movie claims, the Elo system is employed to rank girls by their attractiveness. The equations driving the algorithm are shown briefly, written on the window; however, they are slightly incorrect (screenplay for ''The Social Network'', Sony Pictures, p. 16).


See also

* Bradley–Terry model
* Chess rating system, other chess rating systems
* Elo hell
* Glicko rating system, the rating methods developed by Mark Glickman


External links


Mark Glickman's research page, with a number of links to technical papers on chess rating systems